Abstract

Computer vision approaches, such as deep learning, potentially offer a range of benefits to entomology,
particularly for the image-based identification of taxa. An experiment was conducted to gauge the ability
of a convolutional neural network (CNN) to identify genera of Braconidae from images of forewings. A
deep learning CNN was trained via transfer learning from a small set of 488 images for 57 genera.
Three-fold cross-validation achieved an accuracy of 96.7%, demonstrating that forewing images are
highly predictive of genus. Further work is needed to increase both the coverage to species level and
the number of images available.
Introduction


Insect populations are challenging to study. One of the main problems is the
identification of species, which is particularly difficult in hyperdiverse groups such as
Hymenoptera and is compounded by uneven knowledge of biodiversity around the world
(Amano and Sutherland 2013; Hoye et al. 2020). However, advances in computer vision
approaches provide potential new solutions to this global challenge (Hoye et al. 2020;
Greeff et al. 2022). Computer vision approaches, such as machine learning and deep
learning, are currently influencing a wide range of scientific disciplines but have only
relatively recently been applied to entomology (Boer and Vos 2018; Marques et al. 2018;
Hansen et al. 2019).

Recent studies on image-based insect identification show that deep learning
models can extract features from images and learn to differentiate species to an accuracy
approaching, or exceeding, human expertise (Valan et al. 2019; Hoye et al. 2020). For
example, over half of British ground beetles (Carabidae) can be identified to species,
and 74% to genus, using convolutional neural networks trained on an image set of over
19,000 images (Hansen et al. 2019). Boer and Vos (2018) used over 10,000 images from
AntWeb (www.antweb.org) to classify ants at species level based on dorsal, head, and
profile images. Accuracy of identification was 62-92% for species and 79-95%
for genus, depending on the configuration of the models. Marques et al. (2018)
also examined the classification of ants, and achieved an accuracy of 80-90%,
demonstrating that high confidence and robustness in ant genera identification can be achieved.

Further to identification and diagnostics, the use of images is also being combined
with additional automation and/or robotics to undertake sampling in the field, routine
laboratory sample processing, or extracting data from images (Arje et al. 2020). For
example, Bjerge et al. (2021) have developed an automated light trap to monitor moths
and identify the species using computer vision-based tracking and deep learning. An
automated field trial over 48 nights captured more than 250,000 images, an average of
5675 images per night, with a high validation score for the identification of the 8 most
common moth species. Machine learning methods have been used to automate the
extraction of data on insect herbivore damage from plant specimens in museums,
including the ability to identify different types of herbivores (Meineke and Davies 2018).

In this paper, we test the ability of a convolutional neural network to classify
genera of Braconidae that are present in New Zealand using images of the forewing.


Methods 


Specimens 


All specimens are from the New Zealand Arthropod Collection (NZAC), where the
family Braconidae is well curated with almost all specimens (~18,000) sorted to at least genus
level. However, relatively few endemic or native species have been described (Berry 2010).

Pinned specimens were selected that represent genera of Braconidae which have
been recorded from New Zealand. This includes genera that are endemic (restricted
to New Zealand), native (present in New Zealand but also occurring naturally elsewhere),
accidentally introduced through human trade, or intentionally introduced
for biological control.

Taxa (and the number of images) are: Aleoides (10); Alysia manducator (Panzer, 
1799) (10); Apanteles (12); Aphaereta aotea Hughes & Woolcock, 1976 (8); Aphidius 
colemani Viereck, 1912 (11); Ascogaster elongata Lyle, 1923 (10); Asobara persimilis 
(Papp, 1977) (10); Aspicolpus (10); Aspilota parecur Berry, 2007 (8); Austrohormius (10); 
Bracon phylacteophagus Austin, 1989 (4); Bracon variegator Spinola, 1808 (10); 
Caenophanes sp5 (11); Choeras helespas Walker, 1996 (9); Chorebus rodericki Berry, 2007 
(10); Cotesia (10); Cryptoxilos thorpei Shaw & Berry, 2005 (10); Dacnusa areolaris (Nees, 
1811) (10); Diaeretiella rapae (McIntosh, 1855) (10); Dinocampus coccinellae (Schrank, 
1802) (10); Dinotrema longworthi Berry, 2007 (10); Diolcogaster (10); Dolichogenidea 
tasmanica (Cameron, 1912) (10); Doryctomorpha antipoda Ashmead, 1900 (10); Eadya 
daenerys Ridenbaugh, 2018 (2); Eubazus (10); Glyptapanteles (10); Habrobracon hebetor 
(Say, 1836) (10); Kauriphanes (6); Kiwigaster variabilis Fernandez-Triana & Ward, 
2011 (9); Lysiphlebus testaceipes (Cresson, 1880) (5); Macrocentrus rubromaculata 
(Cameron, 1901) (10); Metaspathius (7); Meteorus pulchricornis (Wesmael, 1835) (9); 
Microctonus hyperodae Loan, 1974 (9); Microplitis (10); Monolexis fuscicornis Forster, 
1862 (3); Neptihormius (10); Notogaster charlesi Fernandez-Triana & Ward, 2020 (10); 
Ontsira antica (Wollaston, 1858) (10); Opius sp2 (10); Pauesia nigrovaria (Provancher, 
1888) (8); Pholetesor (5); Pronkia sp4 (9); Pseudosyngaster pallidus (Gourlay, 1928) 
(10); Rasivalva (2); Rhyssaloides (9); Sathon sp (7); Schauinslandia (10); Shireplitis 
bilboi Fernandez-Triana & Ward, 2013 (2); Shireplitis frodoi Fernandez-Triana & 
Ward, 2013 (3); Spathius exarator (Linnaeus, 1758) (10); Syntretus (10); Taphaeus (10); 
Therophilus (5); Trioxys (10); Venanides (10); and Xynobius (10). 

Some genera were not included because they are wingless, have very reduced wings, 
or there was an insufficient number of specimens. 

We aimed to obtain 10 specimens from each genus, although this was
not always possible. The average number of forewings removed from a genus was 8.6
(range 5-12, median 10). To remove wings, a specimen was placed in a specimen
manipulator and a micropin was used to gently move the tegula up and down until 
the forewing fell off. Wings were not ‘pulled’ because the membrane rips easily. Static 
electricity meant the wing stuck to the micropin and forceps, making it easy to put 
into a gelatin capsule. After all wings had been removed, wings were slide mounted 
with Euparal. 

The specimen records, all images (zip folder), and one representative image of each
genus are freely available via the datastore repository (https://doi.org/10.7931/xftx-6w25).


Imaging and image preparation 


Images of the slide mounted wings were taken on a Nikon SMZ25 stereomicroscope with a Nikon
DS-Ri2 camera (16.25 megapixels). There was no photo stacking. Images were cropped 
and edited using Adobe Photoshop (Fig. 1A). 

The following pre-processing corrections were applied to each image (Fig. 1B)
so that the convolutional neural network focused on diagnostic features and not on
irrelevant differences between the images (such as aspect ratios, colour balance, etc.):
1) colour converted to grayscale; 2) blurred to reduce image grain noise; 3) brightness
and contrast standardised (to mean = 0.5, contrast range = -2 standard deviations to
+2 standard deviations, with the extreme values clipped); 4) aspect ratio ‘squashed’ to a
square and downsampled to 299 by 299 pixels to match the network input size.

Figure 1. Examples of images: slide mounted forewing (left) and image with pre-processing corrections (right).
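
As a minimal sketch of pre-processing steps 1 and 3 listed above, the Python below converts RGB values to grayscale and rescales intensities so that the mean maps to 0.5 and the mean -/+ 2 standard deviations map to 0 and 1, with values outside that range clipped. The function names, the luminance weights (the common ITU-R BT.601 weights) and the exact linear mapping are assumptions made for illustration; the text does not state which were used. Blurring and resizing are omitted.

```python
from statistics import fmean, pstdev


def to_gray(rgb):
    """Convert one (r, g, b) pixel in [0, 1] to a single luminance value.

    The BT.601 weights are an assumption; the source only says 'grayscale'.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def standardise_contrast(pixels, n_sd=2.0):
    """Rescale a 2-D list of grayscale values.

    The image mean is mapped to 0.5, and mean - n_sd*SD and mean + n_sd*SD
    are mapped to 0 and 1 respectively. Values beyond that range are clipped.
    """
    flat = [value for row in pixels for value in row]
    mean = fmean(flat)
    sd = pstdev(flat)
    if sd == 0:
        # A perfectly flat image has no contrast to stretch.
        return [[0.5 for _ in row] for row in pixels]
    scale = 1.0 / (2.0 * n_sd * sd)
    return [
        [min(1.0, max(0.0, 0.5 + (value - mean) * scale)) for value in row]
        for row in pixels
    ]


if __name__ == "__main__":
    image = [[0.2, 0.4], [0.6, 0.8]]
    print(standardise_contrast(image))
```

Note that clipping can shift the mean slightly away from 0.5 when an image contains extreme values.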


A few images were excluded from analyses because the wings had been torn during the
slide mounting process or the images were deemed poor quality (colouration, debris on wing),
problems that were not spotted when the wings were initially removed.


Model training and validation 


Transfer learning was used to train an Xception network that had been initially trained
on the ImageNet image set (www.image-net.org). The images were split into three sets
(folds) via stratified round-robin cross-validation, with each fold serving in turn as
the test set (1/3) while the remaining folds (2/3) were used for training. The fully
connected classification layers were trained for 200 epochs, followed
by a further 200 epochs of fine-tuning of all parameters. The learning rate was fixed at
0.0001 and the Adam optimiser was used to automatically adjust the update magnitude;
this scheme resulted in a very smooth learning curve for this dataset that plateaued at
around 200 epochs, reducing the need for validation sets to determine the optimal
cut-off. Images were randomly augmented during training to reduce the chance of
overfitting and to allow for variations in image conditions that may arise in future
cases. Augmentation was conservative because the images were quite highly
standardised. The augmentations used were: random shifts of up to 10% horizontally
and vertically; random zoom of up to +/-10%; and random rotation of up
to +/-25 degrees.
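
The stratified round-robin split described above can be sketched as follows. This is one plausible reading, not the authors' code: images of each genus are dealt to the folds in turn, so every genus is spread as evenly as possible across the three folds. The function names and the choice to restart at the first fold for each genus are assumptions.

```python
from collections import defaultdict


def round_robin_folds(labels, n_folds=3):
    """Assign each image a fold index, dealing each class across folds in turn.

    `labels` is a list of class names (one per image). The returned list has
    the same length and holds a fold index in range(n_folds) for each image.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [0] * len(labels)
    for label in sorted(by_class):
        # Restarting at fold 0 for every class is an assumption of this sketch.
        for position, index in enumerate(by_class[label]):
            folds[index] = position % n_folds
    return folds


def train_test_indices(folds, test_fold):
    """Return (train, test) index lists with `test_fold` held out."""
    test = [i for i, fold in enumerate(folds) if fold == test_fold]
    train = [i for i, fold in enumerate(folds) if fold != test_fold]
    return train, test


if __name__ == "__main__":
    # Toy labels only; these are not the study's image counts.
    labels = ["A"] * 10 + ["B"] * 2 + ["C"] * 5
    folds = round_robin_folds(labels)
    for k in range(3):
        train, test = train_test_indices(folds, k)
        print(k, len(train), len(test))
```

Under this reading, genera with fewer than three images cannot appear in every test fold, and the folds end up with unequal sizes whenever class counts are not multiples of three.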


Results and discussion 


A total of 488 wings were used, representing 57 genera. Results from cross-validation
gave an overall accuracy of 96.7% (472/488; Table 1). Of the 16 misclassified images,
14 had low confidence scores (<0.9), indicating the network struggled to
classify them (Table 2); many of these are from the subfamily Microgastrinae.

Table 1. Accuracy of cross-validation runs on correct predictions to genus.


Cross-validation    Number of correct images / Total images    Percent accuracy
1                   182/188                                    96.81%
2                   152/156                                    97.44%
3                   138/144                                    95.83%
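
The overall figure follows from pooling the three runs in Table 1 rather than averaging their percentages. The short check below reproduces the per-run and pooled values; the function name is hypothetical and the counts are taken directly from Table 1.

```python
def accuracy_summary(fold_results):
    """Return per-fold percentages plus pooled correct, total and percentage.

    `fold_results` is a list of (correct, total) pairs, one per fold.
    """
    per_fold = [round(100 * correct / total, 2) for correct, total in fold_results]
    pooled_correct = sum(correct for correct, _ in fold_results)
    pooled_total = sum(total for _, total in fold_results)
    pooled_percent = round(100 * pooled_correct / pooled_total, 1)
    return per_fold, pooled_correct, pooled_total, pooled_percent


if __name__ == "__main__":
    table_1 = [(182, 188), (152, 156), (138, 144)]
    print(accuracy_summary(table_1))
```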


Table 2. Misclassified images, showing the correct genus and the genus predicted by the model. Scores
represent the confidence of the model that the prediction is correct. Sorted by highest score.


Catalog number    Correct          Predicted         Confidence score
NZAC02012114      Glyptapanteles   Dolichogenidea    0.997
NZAC02011921      Doryctomorpha    Caenophanes       0.964
NZAC02012115      Glyptapanteles   Sathon            0.765
NZAC02011668      Aphaereta        Asobara           0.65
NZAC02012085      Shireplitis      Venanides         0.649
NZAC02012113      Glyptapanteles   Dolichogenidea    0.597
NZAC02012063      Pholetesor       Sathon            0.567
NZAC02012084      Shireplitis      Venanides         0.56
NZAC02012039      Sathon           Glyptapanteles    0.545
NZAC02011790      Caenophanes      Doryctomorpha     0.535
NZAC02011933      Neptihormius     Metaspathius      0.525
NZAC02012117      Glyptapanteles   Dolichogenidea    0.508
NZAC02011792      Caenophanes      Doryctomorpha     0.497
NZAC02012038      Sathon           Shireplitis       0.471
NZAC02012088      Shireplitis      Venanides         0.395
NZAC02011984      Aleoides         Doryctomorpha     0.293
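
The rows of Table 2 can be summarised programmatically, for example to count how many errors fall below the 0.9 confidence threshold mentioned in the text and which pairs of genera are confused most often. The sketch below uses only the values in Table 2; the function names are hypothetical. Using such a threshold to flag predictions for manual checking is a suggestion of this sketch rather than something the study evaluated, since the confidence scores of the correctly classified images are not reported here.

```python
from collections import Counter

# (catalog number, correct genus, predicted genus, confidence), from Table 2.
MISCLASSIFIED = [
    ("NZAC02012114", "Glyptapanteles", "Dolichogenidea", 0.997),
    ("NZAC02011921", "Doryctomorpha", "Caenophanes", 0.964),
    ("NZAC02012115", "Glyptapanteles", "Sathon", 0.765),
    ("NZAC02011668", "Aphaereta", "Asobara", 0.65),
    ("NZAC02012085", "Shireplitis", "Venanides", 0.649),
    ("NZAC02012113", "Glyptapanteles", "Dolichogenidea", 0.597),
    ("NZAC02012063", "Pholetesor", "Sathon", 0.567),
    ("NZAC02012084", "Shireplitis", "Venanides", 0.56),
    ("NZAC02012039", "Sathon", "Glyptapanteles", 0.545),
    ("NZAC02011790", "Caenophanes", "Doryctomorpha", 0.535),
    ("NZAC02011933", "Neptihormius", "Metaspathius", 0.525),
    ("NZAC02012117", "Glyptapanteles", "Dolichogenidea", 0.508),
    ("NZAC02011792", "Caenophanes", "Doryctomorpha", 0.497),
    ("NZAC02012038", "Sathon", "Shireplitis", 0.471),
    ("NZAC02012088", "Shireplitis", "Venanides", 0.395),
    ("NZAC02011984", "Aleoides", "Doryctomorpha", 0.293),
]


def low_confidence(records, threshold=0.9):
    """Return the records whose confidence score is below `threshold`."""
    return [record for record in records if record[3] < threshold]


def confusion_pairs(records):
    """Count how often each (correct, predicted) pair of genera occurs."""
    return Counter((correct, predicted) for _, correct, predicted, _ in records)


if __name__ == "__main__":
    print(len(low_confidence(MISCLASSIFIED)), "of", len(MISCLASSIFIED))
    for pair, count in confusion_pairs(MISCLASSIFIED).most_common(3):
        print(pair, count)
```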


This small experiment suggests that forewings are highly predictive
of genus-level identity. The model accuracy is particularly impressive given the
very small number of images. Often hundreds or even thousands of images are needed
to build these models. For example, Hansen et al. (2019) had a set of over 19,000
images for ground beetles (Carabidae), and Boer and Vos (2018) used over 10,000 images
from AntWeb. We suggest our trial was successful because forewing morphology
(veins/cells) is already recognised as providing key diagnostic characters for Braconidae, and
images of a forewing are quite simple, with considerably less ‘noise’ than dorsal and
lateral habitus images of an insect body (Valan et al. 2019).

Two main questions need to be addressed in future work. Firstly, how well does
only one species (or morphospecies) represent a genus? Several of the genera above
are monotypic, and for some genera the forewing morphology will differ very little
between species, but for genera with higher species diversity this condition is unlikely
to hold. However, this was an initial trial of the technology, and as the number of
species-level image sets increases, genus-level identification becomes less relevant.
Secondly, how well will the model perform when additional species or genera are added?
An increase in the number of ‘classes’ (taxa) will likely increase the morphological
variability in the dataset, perhaps reducing model accuracy and consequently requiring
more source images to compensate (Greener et al. 2021). A related issue is the level of
image standardisation required. The images used in this study were all photographed
and processed with the same equipment setup; adding subsequent imagery (either for
the same taxa or novel ones) has the potential to cause the model to focus on spurious
photographic differences during training. Similarly, the model has only been tested on
images held out from the same set; how well it performs on other image sets (for the
same taxa) needs to be tested to determine how well the model ‘transfers’ to novel image
sets and whether further refinement of the training process is required, such as more
aggressive image pre-processing and augmentation or the inclusion of more images.

Machine learning methods, particularly convolutional neural networks (CNNs), are
fast becoming valuable tools for the identification of insects (Valan et al. 2019; Hoye
et al. 2020). Identifications are a vital part of making insects visible and accessible
(Greeff et al. 2022). An increase in the number and level of taxa being identified offers
many benefits, including accelerating the discovery and increasing the awareness of a
greater proportion of biodiversity, providing information for applications
such as other academic research, conservation, and biosecurity, and potentially freeing time for
more research tasks.

At present, the major hurdle is the shortage of images (Valan et al. 2019; Greeff et
al. 2022), particularly multiple images of the same species, rather than just one
representative photo for a publication. Digitisation efforts that involve taking images of
specimens are underway in many countries, and large image libraries are available, such as those
on iNaturalist and for specific taxa (e.g., AntWeb, www.antweb.org); however, these will
not always cover, or be suitable for, every taxonomic group. Although the above model
has been built for use in New Zealand, from a distinct set of genera of which several are
endemic, the images from this project could be used for Braconidae in other countries
or regions, albeit with very careful interpretation of the results. Consequently, it is vital
that researchers facilitate sharing and exchange of their images (Valan et al. 2019), and
that collaborative and user-friendly software be developed (Greeff et al. 2022).